ABSTRACT

Motivation: This website detects horizontal transfers with a
combination of parametric methods and proposes an origin by
searching for neighbors in a bank of genomic signatures. The bank
is also used to search for the origin of DNA fragments from
metagenomics studies.

Results: Additional services include inferring a phylogenetic tree
from sequence signatures and comparing two genomes to display the
rearrangements that have occurred since their separation.

Availability and implementation:
http://gohtam.rpbs.univ-paris-diderot.fr/

Contact: patrick.deschavanne@univ-paris-diderot.fr;
ludovic.mallet@jouy.inra.fr

Supplementary information: Supplementary data are available at
Bioinformatics online
http://gohtam.rpbs.univ-paris-diderot.fr:8080/Data/bin/GOHTAM_bin.tgz
1 INTRODUCTION

Horizontal transfers (HTs) are a major force of evolution (Keeling
and Palmer, 2008; Ochman et al., 2000) and this website
proposes methods for their detection. The genomic signature was
demonstrated to be species-specific (Deschavanne et al., 1999;
Sandberg et al., 2001) and allows HT detection in terms of
tetranucleotide frequencies (Dufraigne et al., 2005). Parametric
methods were designed to work only with the information contained
in genomic sequences. They rely either on the whole set of genes or
on local variations of genomic signature (Dufraigne et al., 2005;
Mallet et al., 2010). Recently, a benchmark determined the
most efficient parametric methods under different conditions and
proposed combining methods to analyze HTs in genomes
(Becq et al., 2010). This site provides user-friendly access
to such methods as well as some unique features, including
signature-based phylogeny and the potential origin of a set of
metagenomics sequences.
Fig. 1. Some partial screens of the website. (A) Window-based HT detection;
(B) table of neighbors; (C) signature-based phylogenetic tree; (D) species 
signature; and (E) genome alignment. 

2 GOHTAM SERVICES 

2.1 HT detection 

The two methods proposed can be used alone or in combination. The
first is a window-based signature method as described in Dufraigne
et al. (2005), except that the distance used is the Jensen-Shannon
divergence, a symmetric version of the Kullback-Leibler divergence
(Azad and Lawrence, 2007; Becq et al., 2010). Either sensitivity
or specificity can be increased by an adjustable classification
process (Azad and Lawrence, 2007). A gene-based method is also
proposed with the same distance (Becq et al., 2010). Until now, these
methods had not been offered for online genome analysis (Fig. 1A).
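
As a rough illustration of the window-based approach, the following
Python sketch scores each sliding window by the Jensen-Shannon
divergence between its tetranucleotide frequencies and those of the
whole sequence. The function names, the window and step sizes, and the
single-strand counting are assumptions made for this example; they are
not taken from the GOHTAM programs.

```python
from collections import Counter
from math import log2


def tetra_freqs(seq):
    """Tetranucleotide frequencies of a sequence as a dict word -> frequency."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Keep only words made of unambiguous bases (assumption: others are ignored).
    counts = {w: c for w, c in counts.items() if set(w) <= set("ACGT")}
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def js_divergence(p, q):
    """Jensen-Shannon divergence in base 2, so the value lies between 0 and 1."""
    div = 0.0
    for w in set(p) | set(q):
        pw, qw = p.get(w, 0.0), q.get(w, 0.0)
        mw = 0.5 * (pw + qw)  # mixture of the two distributions
        if pw > 0:
            div += 0.5 * pw * log2(pw / mw)
        if qw > 0:
            div += 0.5 * qw * log2(qw / mw)
    return div


def scan_windows(genome, window=5000, step=500):
    """Yield (start, divergence from the whole-sequence signature) per window."""
    host = tetra_freqs(genome)
    for start in range(0, max(len(genome) - window, 0) + 1, step):
        local = tetra_freqs(genome[start:start + window])
        yield start, js_divergence(local, host)
```

Windows with the largest divergence are candidate HT regions. The
threshold separating them from the host background is left out here
because it depends on the classification procedure chosen.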

2.2 Bank of genomic signatures 

A key feature of GOHTAM is its bank of genomic signatures, the
largest to date. Instead of using only complete genomes (van Passel
et al., 2005; Teeling et al., 2004), this bank is based on the whole
set of sequences of GenBank (release 188, only sequences <1 kb
were discarded) and contains ~248 000 tetranucleotide species
signatures. The bank is updated at each major release.

2.3 Origin of transferred regions 

The signature of each detected region is compared with the signatures
in the bank, and the 10 closest neighbors are displayed with a
confidence rating that depends on the length of both query and
reference sequences and on the distance between the two signatures
(Fig. 1B).
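
The neighbor search can be pictured with the short Python sketch
below, which ranks the entries of a signature bank by their distance
to a query signature and keeps the closest ones. The bank is
represented here as a plain dictionary and the default distance is
Euclidean; both are assumptions for illustration only. The confidence
rating is not reproduced because its formula is not given in this
text.

```python
import heapq
from math import dist


def closest_neighbors(query, bank, k=10, distance=dist):
    """Return the k bank entries nearest to `query` as (distance, name) pairs.

    `bank` maps a species name to its signature vector; `distance` is any
    function of two vectors (Euclidean by default, which is an assumption).
    """
    scored = ((distance(query, sig), name) for name, sig in bank.items())
    return heapq.nsmallest(k, scored)
```

Passing a different `distance` function, for example a Jensen-Shannon
divergence, changes the ranking criterion without touching the search
itself.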
2.4 Metagenomics

In the case of a metagenomics study, a sequence or a set of 
sequences (multi-Fasta) is loaded; the signatures of these sequences 
are compared as above to propose a species of origin. 

2.5 Oligonucleotide content 

The whole set of tetranucleotides of a sequence represents the
signature of that sequence (Deschavanne et al., 1999). This signature
of the 256 possible tetranucleotides takes the form of a 16 x 16
frequency matrix and can be displayed as a signature image
(Fig. 1D).
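
A minimal Python sketch of such a signature is given below. It counts
overlapping tetranucleotides on one strand and arranges the 256
frequencies so that the row is set by the first two bases and the
column by the last two. This layout, the single-strand counting and
the function name are assumptions made for the example; the layout
used for the images on the website may differ.

```python
from itertools import product

BASES = "ACGT"
# The 256 possible words, in lexicographic order AAAA, AAAC, ..., TTTT.
TETRAMERS = ["".join(p) for p in product(BASES, repeat=4)]
INDEX = {word: i for i, word in enumerate(TETRAMERS)}


def signature(seq):
    """Return tetranucleotide frequencies as a 16 x 16 nested list."""
    counts = [0] * 256
    seq = seq.upper()
    for i in range(len(seq) - 3):
        idx = INDEX.get(seq[i:i + 4])  # words with N or other symbols are skipped
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    freqs = [c / total if total else 0.0 for c in counts]
    # Row = first dinucleotide, column = second dinucleotide (assumed layout).
    return [freqs[row * 16:(row + 1) * 16] for row in range(16)]
```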

2.6 Phylogenetic tree of sequence signatures 

It was shown that the species specificity of genomic signatures can
be used to infer phylogenetic trees (Chapus et al., 2005). Loading
a multi-Fasta file of sequences builds a neighbor-joining
phylogenetic tree (Fig. 1C; Felsenstein, 2005).
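
Neighbor-joining works from a matrix of pairwise distances. The Python
sketch below shows one way such a matrix could be prepared from
precomputed signatures and written in the square distance format read
by Phylip. The Euclidean distance, the function name and the toy
sequence names are assumptions for illustration, not a description of
the GOHTAM pipeline.

```python
from math import dist


def phylip_distance_matrix(signatures):
    """Format pairwise Euclidean distances as a square Phylip matrix.

    `signatures` maps a sequence name to its signature vector. Names are
    cut or padded to 10 characters, as the Phylip format expects.
    """
    names = list(signatures)
    lines = [f"{len(names):5d}"]
    for a in names:
        row = " ".join(f"{dist(signatures[a], signatures[b]):.6f}" for b in names)
        lines.append(f"{a[:10]:<10}{row}")
    return "\n".join(lines)


# Toy input with hypothetical names and two-dimensional "signatures".
print(phylip_distance_matrix({"seqA": [1.0, 0.0], "seqB": [0.0, 1.0]}))
```

The resulting text can be saved to a file and given to a
neighbor-joining program to obtain the tree.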

2.7 Genome alignment 

The website uses maximal unique matches (MUMs) to align
genomes. All rearrangements larger than 1 kb between two
genomes are displayed graphically, with the option to select
a region or to modify the length of MUMs (Fig. 1E; Delcher et al.,
1999).

3 IMPLEMENTATION 

Apart from external programs such as the Phylip package
(http://evolution.gs.washington.edu/phylip.html) and Mummer
(http://mummer.sourceforge.net/), the original programs are written
in Python, Perl or R and are available at:
http://gohtam.rpbs.univ-paris-diderot.fr:8080/Data/bin/GOHTAM_bin.tgz

Online help is available. Some analyses take time: HT detection
takes ~6 min and the search for neighbors ~2 min, depending on
the server load and the sequence length.

Owing to its extended reference database, this site provides some
unique features for HT detection, the origin of HT regions,
metagenomics studies and phylogenetic analyses of homologous or
non-homologous sequences. It also improves on the analyses offered
by other genome analysis sites (van Passel et al., 2005;
Teeling et al., 2004).